
KAFKA-20617: add validation for cluster-id in Formatter #22384

Open
gaurav-narula wants to merge 3 commits into apache:trunk from gaurav-narula:KAFKA-20617


@gaurav-narula
Contributor

This change adds validation for Uuids in Formatter, which is used by kafka-storage.sh. It lets users fail fast before they inadvertently end up using an invalid cluster-id.

@github-actions github-actions bot added the triage (PRs from the community), kraft, and small (Small PRs) labels May 27, 2026
@gnarula

gnarula commented May 28, 2026

CC: @showuon @fvaleri

Comment on lines +240 to +241
if (clusterId.contains("=")) {
    throw new FormatterException("The specified cluster id, " + clusterId + " contains padding and is invalid");
Member


It is possible for our randomUuid method to output a uuid that contains an = sign:

This will not generate a UUID equal to 0, 1, or one whose string representation starts with a dash ("-")

So is this the expected validation?

Contributor Author


I don't think that is possible, because Uuid#toString does not use padding and an = sign can only be present when padding is enabled.

Contributor


I think this actually covers a real gap: a CID like igNUVIdeSPO5JCZYFhOh7Q== would pass the validation, but would be returned as igNUVIdeSPO5JCZYFhOh7Q by Uuid.toString(). I would just move the check outside the try-catch block, as it throws FormatterException directly and would bypass the catch anyway.
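
The gap described in this comment can be reproduced with the JDK alone. The following is a minimal sketch, assuming the cluster id's string form is URL-safe base64 without padding, as the Javadoc quoted in this PR describes. `PaddingGapSketch` and its helper names are hypothetical and are not part of Kafka.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical illustration only; not Kafka's Uuid implementation.
final class PaddingGapSketch {
    // The JDK URL-safe decoder accepts trailing '=' padding but does not require it.
    static byte[] decode(String id) {
        return Base64.getUrlDecoder().decode(id);
    }

    // Re-encode as URL-safe base64 without padding (assumed string form of a cluster id).
    static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    public static void main(String[] args) {
        String padded = "igNUVIdeSPO5JCZYFhOh7Q==";
        String unpadded = "igNUVIdeSPO5JCZYFhOh7Q";

        // Both spellings decode to the same 16 bytes.
        System.out.println(Arrays.equals(decode(padded), decode(unpadded))); // prints "true"

        // Round-tripping the padded form silently drops the padding.
        System.out.println(encode(decode(padded))); // prints "igNUVIdeSPO5JCZYFhOh7Q"
    }
}
```

Because the padded and unpadded spellings map to the same bytes, a decode-only check cannot tell them apart, which is why the explicit `=` check is a separate step.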

Uuid uuid = Uuid.fromString(clusterId);
if (Uuid.RESERVED.contains(uuid)) {
    throw new FormatterException("The specified cluster id, " + clusterId + " is reserved");
}
Member


Let's also validate the starting - sign.

Contributor Author

@gaurav-narula gaurav-narula May 29, 2026


I think these are cosmetic improvements that have been added over time but have no bearing on correctness. Failing validation on them would hinder migration from older clusters where - was allowed.

The motivation for avoiding a leading - was to avoid shell escaping issues when passing the cluster id to CLI tools. https://issues.apache.org/jira/browse/KAFKA-13741

Lately, https://issues.apache.org/jira/browse/KAFKA-20072 avoided - altogether to allow easier copy-pasting.

Contributor

@fvaleri fvaleri May 29, 2026


Good point. Let's add a comment about this for future reference.
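
The checks discussed in this thread can be summarized in a JDK-only sketch: reject padding, reject anything that is not URL-safe base64 for 16 bytes, reject the reserved values, and deliberately accept a leading `-`. Everything here is an assumption made for illustration: `ClusterIdValidationSketch`, `rejectReason`, and the reason strings are hypothetical, and the reserved set is assumed to be the UUIDs equal to 0 and 1, matching the values named in the randomUuid documentation quoted earlier. It is not the code in this PR.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical illustration only; not the Formatter code from this PR.
final class ClusterIdValidationSketch {
    /** Returns null when the id is accepted, otherwise a short reason for rejecting it. */
    static String rejectReason(String clusterId) {
        // Padding is checked first because the decoder below would accept it.
        if (clusterId.contains("=")) {
            return "contains padding";
        }
        byte[] raw;
        try {
            raw = Base64.getUrlDecoder().decode(clusterId);
        } catch (IllegalArgumentException e) {
            // For example '/' or '+', which belong to the standard alphabet only.
            return "not URL-safe base64";
        }
        if (raw.length != 16) {
            return "does not decode to 16 bytes";
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        // Assumption: the reserved values are the UUIDs equal to 0 and 1.
        if (mostSignificant == 0L && (leastSignificant == 0L || leastSignificant == 1L)) {
            return "reserved";
        }
        // A leading '-' is intentionally not rejected, so ids generated by
        // older versions keep working.
        return null;
    }

    public static void main(String[] args) {
        String[] ids = {
            "igNUVIdeSPO5JCZYFhOh7Q",   // plain id
            "-gNUVIdeSPO5JCZYFhOh7Q",   // leading dash
            "igNUVIdeSPO5JCZYFhOh7Q==", // padded
            "unrvTtQISjar0JUWGU/8Pg"    // contains '/'
        };
        for (String id : ids) {
            String reason = rejectReason(id);
            System.out.println(id + " -> " + (reason == null ? "accepted" : reason));
        }
    }
}
```

Returning a reason instead of throwing keeps the sketch easy to exercise; the real code throws FormatterException, as the quoted diff lines show.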

Contributor

@fvaleri fvaleri left a comment


@gaurav-narula good catch. Left some comments.


@ParameterizedTest
@ValueSource(strings = {"unrvTtQISjar0JUWGU/8Pg", "igNUVIdeSPO5JCZYFhOh7Q==", "AAAAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAAAQ"})
public void testFormatWithInvalidClusterId(String clusterId) throws Exception {
Contributor


There's no corresponding positive test showing that a valid CID passes validation. Adding one would guard against the validation being too aggressive.

}
try {
    if (clusterId.contains("=")) {
        throw new FormatterException("The specified cluster id, " + clusterId + " contains padding and is invalid");
Contributor


Suggested change
- throw new FormatterException("The specified cluster id, " + clusterId + " contains padding and is invalid");
+ throw new FormatterException("The specified cluster id, " + clusterId + " is invalid: contains padding");

assertEquals(expectedPrefix,
    assertThrows(FormatterException.class,
        formatter1.formatter::run).
            getMessage().substring(0, expectedPrefix.length()));
Contributor


Wouldn't it be easier to rewrite this as:

assertTrue(message.startsWith(expectedPrefix))

Contributor Author


I'm indifferent here - I followed the convention used in another test in the same class
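
For reference, the two assertion styles in this thread agree whenever the message is at least as long as the prefix, and differ only in how a too-short message fails. The sketch below uses plain Java with no test framework; `PrefixAssertSketch` and its method names are hypothetical.

```java
// Hypothetical illustration only; the real tests use JUnit assertions.
final class PrefixAssertSketch {
    // Mirrors the existing style: slice the message and compare for equality.
    // Throws StringIndexOutOfBoundsException if the message is shorter than the prefix.
    static boolean matchesBySubstring(String message, String expectedPrefix) {
        return expectedPrefix.equals(message.substring(0, expectedPrefix.length()));
    }

    // Mirrors the suggested style: returns false for a too-short message.
    static boolean matchesByStartsWith(String message, String expectedPrefix) {
        return message.startsWith(expectedPrefix);
    }

    public static void main(String[] args) {
        String prefix = "The specified cluster id";
        String message = "The specified cluster id, abc is reserved";
        System.out.println(matchesBySubstring(message, prefix));  // prints "true"
        System.out.println(matchesByStartsWith(message, prefix)); // prints "true"
        System.out.println(matchesByStartsWith("short", prefix)); // prints "false"
    }
}
```

Either way the test fails on a short message; the equality form has the side benefit of showing the expected and actual text in an assertEquals failure.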


/**
* Validates the correctness of the given cluster id. A valid cluster id is a base64, urlencoded, no padding
* representation of a {@link Uuid}. These checks do not validate the absence of <code>-</code> character as
* {@link Uuid#randomUuid()} avoids them only for convenience reasons.
Contributor


I would explicitly mention support for already generated CIDs. Basically it's for historical reasons.

Contributor

@fvaleri fvaleri left a comment


LGTM. Thanks.

@github-actions github-actions bot removed the triage (PRs from the community) label May 30, 2026